DEVELOPMENT OF A DISTRIBUTED BIG DATA FUSION ARCHITECTURE FOR MACHINE-TO-MACHINE COMMUNICATION USING ENSEMBLE LEARNING
ABSTRACT
This research developed a distributed big data fusion architecture for machine-to-machine communication using ensemble learning. The architecture was designed to mitigate the challenges of the centralized big data fusion architecture commonly built on the Hadoop MapReduce platform, namely high bandwidth consumption, high latency, and high computational cost. A fog computing approach was adopted and implemented through ensemble learning. Feature engineering was used to extract information from the data (pixel values, number of layers (nlayers), number of cells (ncell), number of rows (nrow), and coordinates), and the Normalized Difference Water Index (NDWI) and Normalized Difference Vegetation Index (NDVI) were calculated. The extracted information served as the training dataset for both the centralized and the distributed architecture, with AdaBoost used as the basis of comparison between the two. Performance was evaluated in terms of bandwidth consumption and latency, and the classification results were presented as confusion matrices. Compared with the centralized architecture, the developed distributed architecture reduced latency by 31.44 minutes and improved accuracy by 1.9%. In addition, improvements of 5.8% in accuracy and 4.81 minutes in latency were recorded when the base learner was compared with the AdaBoost ensemble.
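The abstract names NDVI and NDWI as engineered features and compares a base learner with an AdaBoost ensemble, but it gives neither the index formulas nor the base learner used. The Python sketch below is illustrative only. It assumes the standard NDVI, (NIR - Red)/(NIR + Red), the McFeeters NDWI, (Green - NIR)/(Green + NIR), and a decision stump as the base learner. The five pixel reflectances, their labels, and every helper name (`fit_stump`, `adaboost`, `boosted_predict`, `accuracy`) are hypothetical. The toy data shows why boosting can beat its base learner: the positive class ("sparse vegetation") occupies a middle NDVI range, and no single threshold can isolate it.

```python
import math

# Assumed formulas: standard NDVI and McFeeters NDWI. The abstract names the
# indices but not the spectral bands used to compute them.
def ndvi(red, nir):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 for an all-zero pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total

def ndwi(green, nir):
    """NDWI (McFeeters) = (Green - NIR) / (Green + NIR); 0.0 for an all-zero pixel."""
    total = green + nir
    return 0.0 if total == 0 else (green - nir) / total

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= t else -polarity) != label)
                if best is None or err < best[3]:
                    best = (j, t, polarity, err)
    return best

def stump_predict(stump, row):
    j, t, polarity, _ = stump
    return polarity if row[j] >= t else -polarity

def adaboost(X, y, rounds):
    """Discrete AdaBoost over decision stumps; labels must be +1 or -1."""
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w)
        err = max(stump[3], 1e-10)  # avoid division by zero for a perfect stump
        if err >= 0.5:              # no better than chance: stop boosting
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Misclassified pixels gain weight; correctly classified ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def boosted_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(s, row) for alpha, s in ensemble)
    return 1 if score >= 0 else -1

def accuracy(predict, X, y):
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

# Hypothetical (red, nir) reflectances: water, three sparse-vegetation
# pixels, and a dense canopy. Label +1 = sparse vegetation, -1 = other.
pixels = [(0.06, 0.02), (0.20, 0.30), (0.15, 0.30), (0.10, 0.25), (0.05, 0.45)]
y = [-1, 1, 1, 1, -1]
X = [[ndvi(red, nir)] for red, nir in pixels]  # NDVI only, for brevity

stump = fit_stump(X, y, [1.0 / len(X)] * len(X))
stump_acc = accuracy(lambda row: stump_predict(stump, row), X, y)
ensemble = adaboost(X, y, rounds=3)
boost_acc = accuracy(lambda row: boosted_predict(ensemble, row), X, y)
print(f"base learner accuracy: {stump_acc:.2f}")
print(f"AdaBoost accuracy:     {boost_acc:.2f}")
```

On this toy data the single stump misclassifies the dense-canopy pixel (accuracy 0.80). Three boosting rounds classify all five pixels correctly, because the reweighting forces later stumps to focus on the pixels that earlier stumps got wrong. A real feature vector would also carry NDWI, coordinates, and the other extracted attributes as extra columns. The number of rounds is a tunable choice; it is set to 3 here only to keep the example small.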